refactor(ltm): redesign long-term memory with append-only incremental contexts #8144
RC-CHN wants to merge 55 commits into
Hey - I've found 2 issues, and left some high level feedback:
- The MAX_* limits (MAX_MSGS_PER_USER_SEGMENT, MAX_CHARS_PER_USER_SEGMENT, MAX_RAW_BYTES) are currently hard-coded; consider wiring these through configuration (e.g., provider_ltm_settings) so different deployments or groups can tune memory usage and retention behavior without code changes.
- In _trim_raw_records, total is recomputed by summing len(s.encode()) on every call, which is O(n); if this runs frequently on busy groups, consider tracking a running byte-size counter per umo to avoid repeatedly traversing the deque.
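The running byte-size counter suggested above could look roughly like the following. This is a minimal standalone sketch, not code from the PR: `RawRecordStore`, `byte_totals`, and the `max_raw_bytes` constructor argument are hypothetical names, and the limit is passed in as a parameter to illustrate how a hard-coded `MAX_RAW_BYTES` might instead come from configuration.

```python
from collections import defaultdict, deque


class RawRecordStore:
    """Hypothetical container that tracks UTF-8 byte totals per umo incrementally."""

    def __init__(self, max_raw_bytes: int = 64 * 1024) -> None:
        # max_raw_bytes stands in for MAX_RAW_BYTES; a real integration
        # would read it from configuration rather than a default argument.
        self.max_raw_bytes = max_raw_bytes
        self.records: dict[str, deque[str]] = defaultdict(deque)
        self.byte_totals: dict[str, int] = defaultdict(int)

    def append(self, umo: str, line: str) -> None:
        # Update the counter at the same time as the deque.
        self.records[umo].append(line)
        self.byte_totals[umo] += len(line.encode())

    def popleft(self, umo: str) -> str:
        # Keep the counter in sync when an entry is evicted.
        removed = self.records[umo].popleft()
        self.byte_totals[umo] -= len(removed.encode())
        return removed

    def over_budget(self, umo: str) -> bool:
        # O(1) check: no traversal of the deque is needed.
        return self.byte_totals[umo] > self.max_raw_bytes
```

The trade-off is that every mutation of the deque has to go through `append` and `popleft`, otherwise the counter drifts from the real total.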
## Individual Comments
### Comment 1
<location path="astrbot/builtin_stars/astrbot/long_term_memory.py" line_range="290-299" />
<code_context>
+    # Trimming
+    # =========================================================================
+
+    def _trim_raw_records(self, umo: str) -> None:
+        """Evict only entries before the cursor. Entries after the cursor are never touched (issue #2)."""
+        dq = self.raw_records[umo]
+        cursor = self._raw_cursor[umo]
+
+        # 1. Unconditionally drop entries before the cursor (already consumed)
+        while dq and cursor > 0:
+            dq.popleft()
+            cursor -= 1
+        self._raw_cursor[umo] = cursor
+
+        # 2. Keep evicting from the front by size (bounds total memory in extreme cases)
+        total = sum(len(s.encode()) for s in dq)
+        while total > MAX_RAW_BYTES and dq and cursor > 0:
+            removed = dq.popleft()
+            total -= len(removed.encode())
</code_context>
<issue_to_address>
**issue (bug_risk):** Size-based trimming branch is effectively dead due to cursor reset logic.
In `_trim_raw_records`, the first loop always decrements `cursor` to 0 and then writes it back to `self._raw_cursor[umo]`. As a result, in the size-based loop `while total > MAX_RAW_BYTES and dq and cursor > 0:`, `cursor` is always 0 and the loop never runs, so `MAX_RAW_BYTES` is never enforced.
To preserve the intended behavior (always drop fully-consumed entries, and then optionally drop additional consumed entries to satisfy `MAX_RAW_BYTES`), you’ll need to decouple the notion of “consumed index” from the deque length. For example, track how many entries are removed in the first loop and use that to derive which entries are safe to drop in the size-based phase, rather than relying on `cursor > 0` after the first loop.
</issue_to_address>
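One possible way to make the size limit reachable is to stop gating the second phase on `cursor > 0` and to make the policy for unconsumed entries an explicit choice. The sketch below is a standalone function rather than the PR's method, and it is not the fix the author adopted: `trim_raw_records`, `max_bytes`, and the `evict_unconsumed` flag are hypothetical names introduced for illustration.

```python
from collections import deque


def trim_raw_records(
    dq: deque,
    cursor: int,
    max_bytes: int,
    evict_unconsumed: bool = False,
) -> int:
    """Trim dq in place and return the updated cursor.

    evict_unconsumed is a hypothetical policy flag: when False, entries at
    or after the cursor are never dropped, so the byte limit is best-effort.
    """
    # Phase 1: drop every consumed entry (those before the cursor).
    dropped = min(cursor, len(dq))
    for _ in range(dropped):
        dq.popleft()
    cursor -= dropped

    # Phase 2: enforce the byte budget only if the caller explicitly
    # accepts losing unconsumed entries, oldest first.
    if evict_unconsumed:
        total = sum(len(s.encode()) for s in dq)
        while dq and total > max_bytes:
            total -= len(dq.popleft().encode())

    return cursor
```

With `evict_unconsumed=False` the function keeps the "never touch entries after the cursor" guarantee from the docstring; with `True` it bounds memory at the cost of that guarantee. The two goals cannot both hold once all consumed entries are gone, so the choice has to be made somewhere.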
### Comment 2
<location path="tests/unit/test_long_term_memory.py" line_range="207-216" />
<code_context>
+ def test_tool_call_then_result_then_bot(self):
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for `_build_segments` when a tool result appears without a preceding tool call.
Current `_build_segments` tests only cover well-formed tool flows (`<T:CALL>` → `<T:RES>` → `<BOT>`). Please add a case where a `<T:RES>` appears without a prior `<T:CALL>`, e.g.:
```python
def test_tool_result_without_call_then_bot(self):
    lines = [
        "<T:RES id=orphan>data</T:RES>",
        "<BOT/14:30>: ok",
    ]
    result = _build_segments(lines)
    # assert behavior: either a valid tool segment or clean ignore, no exception,
    # and an intact assistant segment.
```
This helps ensure `_build_segments` behaves predictably with partial or inconsistent histories.
</issue_to_address>
Code Review
This pull request introduces Long Term Memory (LTM) v2, which significantly improves chatroom memory management by implementing incremental context building, support for tool-call history, and memory-efficient message tracking using deques and cursors. The changes also include a fallback mechanism for LLM compression and extensive unit tests. Feedback focuses on several critical areas: a potential memory leak in the contexts dictionary which is currently append-only, a logic error in the size-based trimming of raw records that renders some code unreachable, and the risk of KeyError crashes when parsing malformed tool-call records. Additionally, there is a discrepancy between the system prompt's description of bot message markers and the actual role-based formatting sent to the LLM.
self.contexts: dict[str, list[dict]] = defaultdict(list)
"""Accumulated LLM context. Retained after being modified by ContextManager."""
The self.contexts dictionary is append-only and never pruned. In long-running sessions or active group chats, this will lead to a memory leak as the list of segments grows indefinitely. While append-only contexts help with KV cache hits, you should still implement a maximum context length (e.g., based on the provider's window or a safe segment count) to prevent unbounded memory growth.
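A bounded variant might prune in large, infrequent steps so the prefix still stays stable between prunes. The following is a rough sketch under stated assumptions, not the PR's implementation: `prune_contexts`, `max_segments`, and `keep` are hypothetical names, and segments are assumed to be OpenAI-format message dicts with a `role` key.

```python
def prune_contexts(contexts: list, max_segments: int, keep: int) -> list:
    """Return contexts unchanged until it exceeds max_segments, then keep
    roughly the newest `keep` segments (hypothetical policy)."""
    if len(contexts) <= max_segments:
        # Common case: nothing is dropped, so the prefix stays identical.
        return contexts

    start = len(contexts) - keep
    # In the OpenAI chat format a "tool" message must follow the assistant
    # message that issued the call, so do not start the window on one.
    while start < len(contexts) and contexts[start].get("role") == "tool":
        start += 1
    return contexts[start:]
```

Because pruning only happens when the cap is crossed and then removes many segments at once, the prefix changes once per `max_segments - keep` appended segments rather than on every request.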
async with self._lock:
    umo = event.unified_msg_origin

    # Record the index before writing → on_req_llm excludes it precisely (issue #1, #9)
    raw_idx = len(self.raw_records[umo])
    event.set_extra("_ltm_raw_idx", raw_idx)
handle_message appends to raw_records but never triggers trimming. In groups that rarely interact with the bot, raw_records will grow indefinitely because _trim_raw_records is only called during an agent run. Trimming should be performed here (before calculating raw_idx) to ensure memory usage remains bounded. Note that since this logic is synchronous and does not contain 'await' calls, it is executed atomically in the asyncio event loop and does not require an explicit lock.
umo = event.unified_msg_origin
self._trim_raw_records(umo)
# Record the index before writing → on_req_llm excludes it precisely (issue #1, #9)
raw_idx = len(self.raw_records[umo])
event.set_extra("_ltm_raw_idx", raw_idx)
References
- In a single-threaded asyncio event loop, synchronous functions (code blocks without 'await') are executed atomically and will not be interrupted by other coroutines. Therefore, they are safe from race conditions when modifying shared state within that block.
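The atomicity claim in this reference can be checked with a small standalone experiment. The sketch below is illustrative only and uses hypothetical names (`run_workers`, `sync_update`, `update_across_await`); it contrasts a read-modify-write with no `await` against one that yields to the event loop between the read and the write.

```python
import asyncio


async def run_workers(worker, n: int) -> int:
    """Run n copies of worker against one shared dict and return the total."""
    state = {"value": 0}
    await asyncio.gather(*(worker(state) for _ in range(n)))
    return state["value"]


async def sync_update(state: dict) -> None:
    # No await between the read and the write, so no other coroutine
    # can run in between.
    current = state["value"]
    state["value"] = current + 1


async def update_across_await(state: dict) -> None:
    current = state["value"]
    await asyncio.sleep(0)  # yields to the event loop; other tasks run here
    state["value"] = current + 1  # may overwrite updates made in the meantime


safe_total = asyncio.run(run_workers(sync_update, 10))
racy_total = asyncio.run(run_workers(update_across_await, 10))
```

`safe_total` equals the number of workers, while `racy_total` comes out lower because updates are lost across the `await`. The guarantee therefore holds only as long as the critical section contains no `await`, which is worth re-checking whenever the trimming code changes.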
…ol call re-persistence & add truncate tool
…strategies exclusive
…records_max_bytes
…8153)
* chore: streamline convert_audio_to_opus logic
  - Route Opus conversion directly through the underlying convert_audio_format.
  - Remove redundant FFmpeg processing chains to improve code reusability.
* perf: optimize AMR voice encoding parameters
  - Enhance AMR audio quality via built-in FFmpeg filters.
AstrBotDevs#8136)
* fix: handle None tool arguments returned by Claude API for no-parameter tools
* fix: handle None tool arguments from Claude API for no-parameter tools
* fix: generalize None tool args comment
* fix: generalize None tool args comment
* Remove whitespace to ensure correct formatting
* fix: add ollama and nvidia embedding
* fix: address code review feedback for embedding providers
  - Remove redundant proxy branch in NvidiaEmbeddingProvider._get_client
  - Change ClientError handling to re-raise instead of wrapping in Exception
  - Add exc_info=True for better error diagnostics
  - Remove redundant isinstance check in OllamaEmbeddingProvider._build_payload
* fix: surface weixin media send failures
* fix: include weixin send failure context
* Delete tests/unit/test_weixin_oc_adapter.py
---------
Co-authored-by: Weilong Liao <37870767+Soulter@users.noreply.github.com>
)
* feat(lark): implement app registration and bot info retrieval
  - Add app registration functionality for Lark and Feishu platforms, including endpoints and request handling.
  - Introduce polling mechanism for app registration status.
  - Create bot info retrieval functionality to fetch bot details after successful registration.
  - Enhance dashboard with new UI components for one-click QR setup and manual setup options.
  - Update internationalization files to support new features and actions.
  - Add unit tests for app registration endpoint resolution and data handling.
* feat(weixin_oc): add WeChat login registration and QR code handling
…avoid crashes on invalid or empty values
* fix: add comments and await asyncio.sleep(0) for startup signal
* fix: [Bug] fix MiniMax TTS parsing error on empty-string config
* fix: adopt AI review suggestions: add logging and extract a default-config variable
* fix: remove accidentally added core_lifecycle.py changes
---------
Co-authored-by: RainBot-Ai <qianlanzhiya@gmail.com>
…#8015)
The WebUI only loaded Noto Sans SC (Simplified Chinese), which lacks Cyrillic glyphs. Russian text fell back to system sans-serif, causing poor rendering depending on the OS.
Changes:
- Load Noto Sans (regular) from Google Fonts alongside Noto Sans SC
- Add 'Noto Sans' at the END of $cjk-sans-fallback (after CJK fonts) so Chinese text still renders with system CJK fonts first, while Cyrillic text falls through to Noto Sans.
This ensures both Chinese and Cyrillic text render correctly.
…ffmpeg failure (AstrBotDevs#8009)
* fix: detect Tencent SILK (\x02 prefix) in audio magic bytes to avoid ffmpeg failure
  QQ official bot sends voice in Tencent SILK format (leading \x02 byte before #!SILK_V3 magic). _get_audio_magic_type() had two off-by-one slice errors:
  1. Standard SILK: header[:8] vs b'#!SILK_V3' (8 != 9 bytes), so it never matched
  2. Tencent SILK: not detected at all
  Fixes:
  - Standard SILK: header[:9] == b'#!SILK_V3' (correct 9-byte slice)
  - Tencent SILK: header[:1] == b"\x02" and header[1:10] == b'#!SILK_V3'
  - ensure_wav() routes detected silk to tencent_silk_to_wav()
  Before: QQ voice → ffmpeg → 'Invalid data found'
  After: QQ voice → magic detects silk → tencent_silk_to_wav → WAV OK
* refactor: use startswith() for SILK magic byte detection
  Replace manual slice comparisons with startswith(): cleaner, less error-prone, and immune to off-by-one slice errors.
  Suggested by: sourcery-ai
* fix(core): pass images through active replies
* fix: harden active reply image collection
* test: avoid logger coupling in active reply test
* Delete tests/unit/test_builtin_astrbot_main.py
---------
Co-authored-by: Weilong Liao <37870767+Soulter@users.noreply.github.com>
…xt (AstrBotDevs#8205)
PR AstrBotDevs#8015 added 'Noto Sans' to the Google Fonts link and CJK fallback list, but the font was placed at the end of $cjk-sans-fallback where browsers never reach it for Cyrillic text. The global $body-font-family also lacked 'Outfit' entirely, causing Vuetify to use CJK fonts as the primary face.
Changes:
- Remove 'Noto Sans' from the end of $cjk-sans-fallback (it is not a CJK font)
- Add 'Outfit' and 'Noto Sans' to $body-font-family before CJK fallbacks
- Update .Outfit class in _container.scss to match the new stack
This ensures:
- Latin text → Outfit
- Cyrillic text → Noto Sans (loaded by vite-plugin-webfont-dl)
- CJK text → Noto Sans SC / PingFang SC etc.
Fixes follow-up to AstrBotDevs#8015.
Motivation
Fixes #8080
Rewrite the long-term memory (LTM) module from a ring buffer to an append-only architecture that keeps context prefixes stable across requests — enabling KV cache hits and the associated cost discounts (typically 1/10 of standard pricing across OpenAI, Anthropic, DeepSeek, and cloud providers).
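The prefix-stability argument can be illustrated with a toy comparison. This is a sketch for intuition only, with hypothetical names (`common_prefix_len`, a window of 4 messages); it does not model any provider's actual cache, only how much of the previous request's message list is reused as a prefix of the next one.

```python
from collections import deque


def common_prefix_len(a: list, b: list) -> int:
    """Number of leading items that two message lists share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


messages = [f"msg{i}" for i in range(6)]

# Ring buffer: once full, every new message evicts the oldest one,
# so the list sent to the model starts at a different message each time.
ring = deque(maxlen=4)
ring.extend(messages[:5])
ring_before = list(ring)
ring.append(messages[5])
ring_after = list(ring)

# Append-only: the earlier list is always a prefix of the later one.
log = messages[:5]
log_before = list(log)
log.append(messages[5])
log_after = list(log)

ring_shared = common_prefix_len(ring_before, ring_after)
log_shared = common_prefix_len(log_before, log_after)
```

Here `ring_shared` is 0 while `log_shared` is 5: with the full ring buffer no leading message is shared between consecutive requests, whereas the append-only list reuses the whole previous request as a prefix.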
Modifications
Core: `astrbot/builtin_stars/astrbot/long_term_memory.py`
- `max_cnt` ring buffer replaced with `raw_records` (deque) + `_raw_cursor` + `contexts` (append-only list). Old segments are never rebuilt.
- `_build_segments()` converts raw chat lines into OpenAI-format context segments, handling tool calls, parallel tools, and multi-step chains.
- `<BOT/>` markers replace `[You/]` to avoid nickname collisions.
- `on_agent_done` records tool-call chains and now includes the @bot prompt in contexts so future rounds see the user's original message.
- `asyncio.Lock` for concurrency safety; `remove_session()` for cleanup.

Hook wiring: `astrbot/builtin_stars/astrbot/main.py`
- `@on_llm_response` → `@on_agent_done` for accurate tool-chain recording.
- `group_icl_enable=true` skips Conversation DB query (`conversation=None`).

Config: `astrbot/builtin_stars/astrbot/default.py`
- `context_limit_reached_strategy` → `"llm_compress"`.

Agent runner: `astrbot/core/astr_main_agent.py`
- `_get_compress_provider` auto-falls back to the main chat provider when `llm_compress_provider_id` is unset, preventing silent truncation.

Tests: `tests/unit/test_long_term_memory.py` (new, 47 tests)
- Pure functions: extract, parse, truncate, build_segments (31 tests).
- Integration: round-trip lifecycle, multi-round accumulation, tool chains, persona preservation, concurrent safety (16 tests).
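The provider fallback described for the agent runner can be pictured as follows. This is a hedged sketch, not the actual `_get_compress_provider` code, whose signature is not shown in this PR page: `pick_compress_provider`, `providers`, and the treatment of an unknown ID are assumptions made for illustration.

```python
from typing import Optional


def pick_compress_provider(
    providers: dict,
    compress_provider_id: Optional[str],
    main_provider: object,
) -> object:
    """Hypothetical helper: prefer the configured compression provider,
    otherwise fall back to the main chat provider."""
    if compress_provider_id:
        provider = providers.get(compress_provider_id)
        if provider is not None:
            return provider
    # Unset (or, in this sketch, unknown) ID: reuse the main provider
    # instead of silently skipping compression.
    return main_provider
```

Falling back to the main provider means compression costs are billed against the chat model, which may be acceptable as a default but is worth surfacing in the configuration docs.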
This is NOT a breaking change.
Screenshots or Test Results
Tested on personal self-hosted astrbot.

Checklist
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in `requirements.txt` and `pyproject.toml`.
😮 My changes do not introduce malicious code.
Summary by Sourcery
Refactor the long-term memory subsystem to use an append-only, incremental context architecture and integrate it with agent completion hooks, while improving default compression behavior and regression coverage.
Enhancements:
Tests:
Summary by Sourcery
Refactor group long-term memory to an append-only, incrementally built context model integrated with agent completion hooks, while tightening context compression behavior and isolating request-time context guarding from persistent history management.
Enhancements:
Tests: